32 research outputs found

    Multilingual Neural Machine Translation: the case-study for Catalan, Spanish and Portuguese Romance Languages

    Machine translation is the task of automatically translating one language into another. This project aims to evaluate the performance of state-of-the-art deep learning systems on translation between similar languages. We evaluate translation between Catalan, Spanish, and Portuguese, which are Romance languages, to see how the Transformer architecture performs on this task. We additionally use several techniques to improve translation between these languages. First, we use multilingual models that allow for transfer learning as well as zero-shot translation. Second, we apply back-translation to make use of monolingual data and improve the system's translations. Lastly, we improve translation in a specific domain using fine-tuning.

    Hyperparameter optimization using agents for large scale machine learning

    Machine learning (ML) has become an essential tool for humans to get rational predictions in different aspects of their lives. Hyperparameter optimization algorithms are a tool for creating better ML models. These algorithms iteratively execute sets of trials, and the trials usually have different execution times. In this paper we optimize grid search and random search with cross-validation in Dislib [1], an ML library for distributed computing built on top of the PyCOMPSs [2] programming model, taking inspiration from Maggy [3], an open-source framework based on Spark. The optimization uses agents so that trials do not have to wait for each other, achieving a speed-up of over 2.5x compared to the previous implementation.
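
    The idea of trials not waiting for each other can be illustrated with a minimal sketch. It uses only the Python standard library, not Dislib, PyCOMPSs, or Maggy, and it is not the paper's implementation. The `run_trial` function, the search space, and the scoring formula are hypothetical stand-ins for training with cross-validation; the only point shown is that each worker picks up the next trial as soon as it is free instead of waiting for a whole batch to finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product


def run_trial(params):
    """Hypothetical stand-in for training plus cross-validation.

    The sleep makes trials take different amounts of time, and the
    score is an invented formula that peaks at depth=3, lr=0.1.
    """
    time.sleep(0.01 * params["depth"])
    score = 1.0 / (1.0 + abs(params["depth"] - 3) + abs(params["lr"] - 0.1))
    return params, score


def grid(space):
    """Expand a dict of value lists into a list of parameter dicts."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def search(space, workers=2):
    """Run every grid point; results are collected as each trial finishes."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_trial, trial) for trial in grid(space)]
        # as_completed yields futures in completion order, so a slow trial
        # never blocks the collection of faster ones.
        for future in as_completed(futures):
            results.append(future.result())
    best = max(results, key=lambda item: item[1])
    return best, results


space = {"depth": [1, 3, 5], "lr": [0.01, 0.1]}
best, results = search(space)
print("best parameters:", best[0])
```

    A thread pool is used here only to keep the sketch short; in a distributed setting the same pattern would be expressed with the task or agent mechanism of the framework in use.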

    DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication

    Metrics for set similarity are a core aspect of several data mining tasks. To remove duplicate results in a Web search, for example, a common approach looks at the Jaccard index between all pairs of pages. In social network analysis, a much-celebrated metric is the Adamic-Adar index, widely used to compare node neighborhood sets in the important problem of predicting links. However, with the increasing amount of data to be processed, calculating the exact similarity between all pairs can be intractable. The challenge of working at this scale has motivated research into efficient estimators for set similarity metrics. The two most popular estimators, MinHash and SimHash, are indeed used in applications such as document deduplication and recommender systems where large volumes of data need to be processed. Given the importance of these tasks, the demand for advancing estimators is evident. We propose DotHash, an unbiased estimator for the intersection size of two sets. DotHash can be used to estimate the Jaccard index and, to the best of our knowledge, is the first method that can also estimate the Adamic-Adar index and a family of related metrics. We formally define this family of metrics, provide theoretical bounds on the probability of estimate errors, and analyze its empirical performance. Our experimental results indicate that DotHash is more accurate than the other estimators in link prediction and detecting duplicate documents with the same complexity and similar comparison time
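
    The abstract does not spell out how the estimator is built, so the following is only an illustrative sketch of a dot-product-based intersection estimator. It assumes each element is mapped to a pseudo-random vector of +1/-1 entries and each set is summarized by the sum of its elements' vectors. All function names, the dimension, and the example sets are hypothetical and not taken from the paper. The Jaccard index is then derived from the estimated intersection size with the standard identity J = |A ∩ B| / (|A| + |B| - |A ∩ B|).

```python
import random


def element_vector(element, dim, seed=0):
    """Pseudo-random +1/-1 vector derived deterministically from the element."""
    rng = random.Random(f"{seed}:{element!r}")
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(dim)]


def sketch(items, dim, seed=0):
    """Sum the element vectors of a set into one fixed-size sketch."""
    total = [0.0] * dim
    for item in items:
        for i, value in enumerate(element_vector(item, dim, seed)):
            total[i] += value
    return total


def estimate_intersection(sketch_a, sketch_b):
    """Normalized dot product of two sketches.

    A shared element contributes exactly 1 after dividing by the dimension;
    two different elements contribute 0 in expectation, so the expected
    value of the result is the intersection size.
    """
    dim = len(sketch_a)
    return sum(x * y for x, y in zip(sketch_a, sketch_b)) / dim


def estimate_jaccard(size_a, size_b, intersection):
    """Jaccard index from the two set sizes and an intersection estimate."""
    return intersection / (size_a + size_b - intersection)


set_a = set(range(0, 30))   # 30 elements
set_b = set(range(20, 50))  # 30 elements, 10 of them shared with set_a
DIM = 2048                  # assumed sketch dimension; larger means less noise
est = estimate_intersection(sketch(set_a, DIM), sketch(set_b, DIM))
jac = estimate_jaccard(len(set_a), len(set_b), est)
print(f"estimated intersection: {est:.2f} (exact value is 10)")
print(f"estimated Jaccard index: {jac:.3f} (exact value is 0.200)")
```

    Once the sketches are built, comparing two sets costs a single dot product of fixed length, regardless of the set sizes, which is what makes this kind of estimator attractive for all-pairs comparisons.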

    El Tractament Integrat de les Llengües i l’aprenentatge de les ciències a l’Educació Secundària Obligatòria

    The government of the Balearic Islands has launched a decree on the integrated treatment of languages. Its aim is to achieve full proficiency in Catalan and Spanish, the two official languages, along with adequate competence in a foreign language, preferably English, through its use as a language of instruction in several subjects of the curriculum. This paper aims to highlight some of the effects that teaching and learning in English may have on natural sciences subjects (Biology, Geology, Physics and Chemistry) in Compulsory Secondary Education (ESO) in the Balearic Islands. On the one hand, the authors analyze some data on the linguistic communication competence of ESO students in this community. On the other hand, they review the contributions of several experts in science education on the crucial and specific role of language and argumentation in science learning, and its link to scientific reasoning and the construction of scientific ideas. The paper ends with some considerations on the conditions necessary for introducing English as a language of communication in ESO science classes, without prejudice to the acquisition of full proficiency in the two official languages of the community.

    Impacts of Use and Abuse of Nature in Catalonia with Proposals for Sustainable Management

    This paper provides an overview of the last 40 years of use, and in many cases abuse, of the natural resources in Catalonia, a country that is representative of European countries in general, and especially those in the Mediterranean region. It analyses the use of natural resources made by mining, agriculture, livestock, logging, fishing, nature tourism, and energy production and consumption. This use results in an ecological footprint, i.e., the productive land and sea surface required to generate the consumed resources and absorb the resulting waste, which is about seven times the amount available, a very high number but very similar to other European countries. This overexploitation of natural resources has a huge impact on land and its different forms of cover, air, and water. Over the last 25 years, forests and urban areas have each gained almost 3% more of the territory at the expense of agricultural land; those municipalities bordering the sea have increased their number of inhabitants and activity, and although they only occupy 6.7% of the total surface area, they account for 43.3% of the population; air quality has stabilized since the turn of the century, and there has been some improvement in the state of aquatic ecosystems, but still only 36% are in good condition, while the remainder have suffered morphological changes and different forms of nonpoint source pollution; meanwhile the biodiversity of flora and fauna remains under threat. Environmental policies do not go far enough, so there is a need for revision of the legislation related to environmental impact and the protection of natural areas, flora, and fauna. The promotion of environmental research must be accompanied by environmental education to foster a society which is more knowledgeable and has more control and influence over the decisions that deeply affect it. Indeed, nature conservation goes hand in hand with other social and economic challenges that require a more sustainable vision. Today's problems with nature derive from the current economic model, which is environmentally unsustainable in that it does not take into account environmental impacts. Lastly, we propose a series of reasonable and feasible priority measures and actions related to each use made of the country's natural resources, to the impacts they have had, and to their management, in the hope that these can contribute to improving the conservation and management of the environment and biodiversity and move towards sustainability.

    Vigilància epidemiològica dels casos greus hospitalitzats confirmats de grip. Xarxa sentinella PIDIRAC (Catalunya 2010-2015)

    Keywords: Flu; Surveillance; Epidemiology; Antivirals; Vaccine. Introduction: the Information Plan for Acute Respiratory Infections in Catalonia (PIDIRAC) incorporated the surveillance of severe hospitalized confirmed influenza cases (CGHCG) in 2009. The objective of the study is to describe the clinical, epidemiological and virological features of the CGHCG registered in 12 sentinel network hospitals during five influenza seasons. Method: the sample consists of the CGHCG registered during the 2010-2011 to 2014-2015 influenza seasons. The confirmation technique used was PCR and/or viral isolation in cell culture from a respiratory sample. Results: 1,400 CGHCG were registered, of which 33% required admission to the ICU and 12% died. The mean age of cases was 55.2 years (SD: 26.7 years), with a range of 0-101 years. 70.8% were not vaccinated; 87% received antiviral treatment, in 80.4% and 24% of the cases within 48 hours of admission and of symptom onset, respectively. Influenza A virus was identified in 87.7% of the cases (37.9% A(H1N1)pdm09 and 29.3% A(H3N2)). Conclusions: surveillance of CGHCG provides an estimate of the severity of seasonal influenza epidemics and makes it possible to identify and characterize at-risk groups in order to adopt preventive measures (vaccination) and early antiviral treatment.

    Performance testing of python libraries

    The economic impact that proprietary ISAs have on the market has increased interest in open-source ISAs. More specifically, RISC-V has been gaining a lot of traction in the research community. The open-source environment has allowed for the development of software and hardware stacks for Exascale computations. To take advantage of these resources and allow for the execution of large and complex applications, task-based programming models have become more popular, thanks to the ease with which they handle composite workflows that require a large amount of data and computation time. Moreover, most of the applications being developed nowadays are related to Machine Learning in general, and in the context of RISC-V there is a lot of interest in developing applications for Embedded Systems, where the framework of Hyperdimensional Computing is becoming more popular. For these reasons, we present this study in the scope of the MareNostrum Experimental Exascale Platform (MEEP), a flexible FPGA-based emulation platform designed for future RISC-V supercomputers. The study evaluates Machine Learning algorithms, classical Linear Algebra algorithms used for ML, and Hyperdimensional Computing algorithms using COMPSs, a task-based programming model for the development of applications for distributed infrastructures, on different RISC-V boards being developed in the MEEP project and with different mathematical libraries.

    Feminisme (s) : [Editorial]

    What binds all contemporary struggles for the advancement and consolidation of democracy, while guaranteeing and extending rights and freedoms is, undoubtedly, feminisms. Whether as a social movement or through the power of contributions from highly diverse authors, feminisms today are at the vanguard of thought on creating new spaces and broadening the field of action of ideas in the struggle for a fairer, more democratic future. As a plural, crosscutting movement, it questions the foundations of the current social and political structure, mobilising and proposing transformational alternatives at all levels

    Multilingual neural machine translation: case-study for Catalan, Spanish and Portuguese romance languages

    In this paper, we describe the TALP-UPC participation in the WMT Similar Language Translation task between Catalan, Spanish, and Portuguese, all of them Romance languages. We used several techniques to improve translation between these languages. A multilingual shared encoder/decoder was used for all of them. Additionally, we applied back-translation to take advantage of the monolingual data. Finally, we applied fine-tuning to improve performance on in-domain data. Each of these techniques brings improvements over the previous one. In the official evaluation, our system was ranked 1st in the Portuguese-to-Spanish direction, 2nd in the opposite direction, and 3rd in the Catalan-Spanish pair. This work is supported in part by the Spanish Ministerio de Ciencia e Innovacion, through the postdoctoral senior grant Ramon y Cajal, and by the Agencia Estatal de Investigacion through the projects EUR2019-103819, PCIN2017-079 and PID2019-107579RB-I00 / AEI / 10.13039/501100011033. Peer reviewed. Postprint (published version).